Introduction


• Hardware fault tolerance is the most _ of the fault-tolerance areas.

• Many techniques are extant.

• The main drawback has been _.

• As transistors become free, _ may be the new _.
2.1 The Rate of Hardware Failures -

Component Failure Rates 


• Component failure rate

- The _ that a currently _ component will suffer in a given _

- Depends on
  1. _
  2. _
  3. _
  4. _

[Figure: plot with horizontal axis "Age (years)"]
2.1 The Rate of Hardware Failures -

Factors Involved in Component Failure Rates 

$$\lambda = \pi_L \pi_Q \left(C_1 \pi_T \pi_V + C_2 \pi_E\right)$$

λ          Failure rate of component
π_L        _ factor
π_Q        _ factor
π_T        _ factor
π_V        _ factor for CMOS
π_E        _ factor
C_1, C_2   _ factors
2.2 Failure Rate, Reliability and MTTF -

Component Lifetimes 

• Consider a component that is operational at time t 


= 0 and remains operational until it is hit by a 
failure (_and_) 

- _is the_of the component 

- _is the_, represents the 

_probability of a failure at time t 

- _is the_, is 

the probability that the_will_ 

_, F(t) = Prob{T < t} 

- _is the_of a component, the 


probability that it will_, R(t) = 

Prob{T > t} = 1 - F(t) 


$$f(t) = \frac{dF(t)}{dt}, \qquad F(t) = \int_0^t f(\tau)\,d\tau$$

$$\int_0^\infty f(t)\,dt = 1, \qquad f(t) \ge 0 \text{ for } t \ge 0$$

Electrical and Computer Engineering


UAH Chapter 2 CPE 633 

2.2 Failure Rate, Reliability and MTTF- 

Component Reliability 


• F(t) represents the probability that a_component 

will fail_in the future. A more meaningful 

quantity is the probability that a good component of 

_will fail in the next_ 

This is a_probability, since we know 

the component survived_. 


$$\lambda(t) = \frac{f(t)}{1 - F(t)}$$

We can put this in terms of reliability

$$\frac{dR(t)}{dt} = \frac{d(1 - F(t))}{dt} = -\frac{dF(t)}{dt} = -f(t)$$


• Solving for R (R(0) = 1) 

- R(t) = 

• f(t) = F(t) = 
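
The step from the failure rate to R(t) can be written out as a short derivation. This is a sketch using only the definitions on this slide; the constant-rate case in the last line is an assumption added here for illustration.

```latex
\begin{align*}
\lambda(t) &= \frac{f(t)}{1 - F(t)} = -\frac{1}{R(t)}\,\frac{dR(t)}{dt} \\
R(t) &= \exp\!\left(-\int_0^t \lambda(\tau)\,d\tau\right)
      \quad \text{using } R(0) = 1 \\
\text{for constant } \lambda:\quad
R(t) &= e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t}, \qquad f(t) = \lambda e^{-\lambda t}
\end{align*}
```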
2.2 Failure Rate, Reliability and MTTF -

MTTF of a Component 


• For an _ component, the _ is equal to its _, E[T]

- MTTF = E[T] =

- MTTF =
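
As a numerical illustration, the expected lifetime can be obtained by integrating R(t) over time. The sketch below assumes a constant failure rate, so that R(t) = exp(-λt); the function names and the integration horizon are hypothetical choices made for this example.

```python
import math

def reliability(t, lam):
    """R(t) for a component with an assumed constant failure rate lam."""
    return math.exp(-lam * t)

def mttf_numeric(lam, horizon_factor=40.0, steps=200_000):
    """Approximate MTTF as the integral of R(t) from 0 to a long horizon (trapezoid rule)."""
    horizon = horizon_factor / lam      # R(t) is negligible beyond this point
    dt = horizon / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(horizon, lam))
    for i in range(1, steps):
        total += reliability(i * dt, lam)
    return total * dt

if __name__ == "__main__":
    lam = 0.002                          # failures per hour (illustrative value)
    print(mttf_numeric(lam), 1.0 / lam)  # the two values should nearly coincide
```

For a constant rate the integral evaluates to 1/λ, which the numerical result should reproduce closely.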
2.2 Failure Rate, Reliability and MTTF -

Non-Constant Failure Rates 


• Although a_failure rate is used in most 

calculations of reliability, there are cases for which this 

simplifying assumption is_, especially 

during the_and_ 

phases of a component’s life. 

• In such cases, the_distribution is often 

used, which has two parameters,_and_, and has the 

following density function of the lifetime T of a 
component 

- f(t) =

- λ(t) =

- R(t) =

- MTTF =

- Γ(x) =
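
A small sketch of a two-parameter lifetime model with a non-constant failure rate. It assumes the Weibull form f(t) = λβt^(β-1)·exp(-λt^β), which is one common parametrization and may differ from the one intended on the slide; the parameter names `lam` and `beta` are assumptions.

```python
import math

def weibull_density(t, lam, beta):
    """f(t) under the assumed Weibull parametrization."""
    return lam * beta * t ** (beta - 1) * math.exp(-lam * t ** beta)

def weibull_failure_rate(t, lam, beta):
    """Failure rate: decreasing for beta < 1, constant for beta = 1, increasing for beta > 1."""
    return lam * beta * t ** (beta - 1)

def weibull_reliability(t, lam, beta):
    """R(t) = exp(-lam * t**beta)."""
    return math.exp(-lam * t ** beta)

def weibull_mttf(lam, beta):
    """MTTF = Gamma(1/beta) / (beta * lam**(1/beta))."""
    return math.gamma(1.0 / beta) / (beta * lam ** (1.0 / beta))

if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0):
        print(beta, weibull_failure_rate(10.0, 0.01, beta), weibull_mttf(0.01, beta))
```

With beta = 1 the model reduces to the constant-failure-rate case, which is a convenient sanity check.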
2.3.1 Series and Parallel Systems -

Series System Reliability 



• One of the most basic structures is the_ 

system shown. 

• A_system is defined as a set of N 

modules connected together so that the failure of 
_causes the entire system to fail. 

• If the failure of each module is_, 

the reliability of the system is 

• R_s(t) =

• If module i has a constant failure rate, λ_i,


• R_s(t) =

• MTTF_s =
2.3.1 Series and Parallel Systems -

Parallel System Reliability 


• The other most basic structure is the 
_system shown. 

• A_system is defined as a set of 

N modules connected together so that it 

requires the failure of_for 

the system to fail. 

• If the failure of each module is 

_, the reliability of the 

system is 

• R_p(t) =

• For two modules

• R_p(t) =

• MTTF_p =
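
The series and parallel definitions can be illustrated with a few lines of code. This is a minimal sketch assuming independent module failures; the function names are hypothetical, and the MTTF helpers additionally assume constant failure rates (identical rates in the parallel case).

```python
import math

def series_reliability(module_reliabilities):
    """A series system works only if every module works."""
    return math.prod(module_reliabilities)

def parallel_reliability(module_reliabilities):
    """A parallel system fails only if every module fails."""
    return 1.0 - math.prod(1.0 - r for r in module_reliabilities)

def series_mttf(failure_rates):
    """With constant rates, the series system fails at the sum of the module rates."""
    return 1.0 / sum(failure_rates)

def parallel_mttf_identical(lam, n):
    """MTTF of n identical parallel modules, each with constant rate lam."""
    return sum(1.0 / (k * lam) for k in range(1, n + 1))

if __name__ == "__main__":
    print(series_reliability([0.9, 0.9]))      # lower than either module
    print(parallel_reliability([0.9, 0.9]))    # higher than either module
    print(series_mttf([0.001, 0.001]), parallel_mttf_identical(0.001, 2))
```

For two modules the parallel MTTF is 1.5 times the single-module MTTF, while the series MTTF is half of it.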
2.3.2 Non-Series/Parallel Systems -

Hybrid System Reliability 


Not all systems have a _ with _ structure.

[Figure: reliability block diagram with modules A, B, C, D, E, and F]

Each path represents a _ that allows the system to operate.

For example, the path _ means successful operation if _ are fault-free.

A path in such reliability diagrams is valid only if all modules and edges are traversed from _ _; for example, _ is an invalid path in the example shown.
2.3.2 Non-Series/Parallel Systems -

Expansion Around C, C Not Working 

• The diagram can be_until we have the 

_series or parallel forms. To do this, 

we rely on the_ 

- R_system = R_i · Prob{system works | i is fault-free} + (1 − R_i) · Prob{system works | i is faulty}

• We pick one module to_, in this case, 

module 



(a) C not working 


For C not working, we 
have B and E in parallel 
with A and D, all in 
series with F. 


Prob{system works|C faulty} = 
2.3.2 Non-Series/Parallel Systems -

Expansion Around C, C Working 


• For C working, we still _ simple parallel series combinations, so we must pick another module about which to _. Let’s try _.




(b) C working 
2.3.2 Non-Series/Parallel Systems -

Expansion Around E, E Not Working 

• The diagram for E not working is shown; it has a _ structure, and the only path is _.

$$R_{E\ \text{not working}} = R_A R_D R_F$$

- R_system = R_i · Prob{system works | i is fault-free} + (1 − R_i) · Prob{system works | i is faulty}
2.3.2 Non-Series/Parallel Systems -

Expansion Around E, E Working 

The diagram for E working is shown. There are three paths, _, _, and _. However, the _ path _ the _ path (if _ and _ are both working, the system works whether _ works or not).

$$R_{E\ \text{working}} = R_F \left[1 - (1 - R_A)(1 - R_B)\right]$$

Putting it all together,
R_system =

[Figure: reduced diagram for E working]
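
The expansion can be cross-checked by brute force. The sketch below assumes the six-module diagram has the valid paths A-D-F, B-E-F, and A-C-E-F (an assumption consistent with the sub-diagrams described on these slides, since the original figure is not reproduced here) and compares the conditioning result with an enumeration of all module states.

```python
from itertools import product

MODULES = "ABCDEF"
# Assumed valid paths through the diagram (traversed in one direction only).
PATHS = [set("ADF"), set("BEF"), set("ACEF")]

def exact_by_enumeration(r):
    """Sum the probabilities of all module states in which some path is fully working."""
    total = 0.0
    for state in product([True, False], repeat=len(MODULES)):
        working = {m for m, ok in zip(MODULES, state) if ok}
        if any(path <= working for path in PATHS):
            prob = 1.0
            for m, ok in zip(MODULES, state):
                prob *= r[m] if ok else 1.0 - r[m]
            total += prob
    return total

def exact_by_expansion(r):
    """Condition on C first, then on E, as in the slides."""
    c_faulty = r["F"] * (1 - (1 - r["A"] * r["D"]) * (1 - r["B"] * r["E"]))
    e_faulty = r["A"] * r["D"] * r["F"]                      # only path A-D-F remains
    e_working = r["F"] * (1 - (1 - r["A"]) * (1 - r["B"]))   # A or B suffices, then F
    c_working = r["E"] * e_working + (1 - r["E"]) * e_faulty
    return r["C"] * c_working + (1 - r["C"]) * c_faulty

if __name__ == "__main__":
    r = {m: 0.9 for m in MODULES}
    print(exact_by_enumeration(r), exact_by_expansion(r))
```

Enumeration grows as 2^N and is only practical for small diagrams, which is why the expansion (or the bounds) is preferred for larger systems.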
2.3.2 Non-Series/Parallel Systems -

Reliability Upper Bound 


• If the structure is too complicated for repeated application 

of the_, it is possible to calculate upper and 

lower_, rather than_values, for the 

reliabilities of the system. 

• An upper_is given by 

- R_system ≤

where R_{path i} is the reliability of the series connection of the modules along path i.

• This bound assumes that all the paths are_and 

that they are_. 

• Going back to our example, the paths are_,_, 

and_. 

- Rsystem = 


• The upper bound can be used to derive the _ reliability by replacing every occurrence of (R_i)^k by R_i; each module is used only once.
2.3.2 Non-Series/Parallel Systems -

Reliability Lower Bound 


• A lower bound can be calculated based on _ _ of the system diagram, where a _ _ is a minimal list of modules such that the removal of _ of the set will cause a _ system to _.

• The lower bound is obtained by

- R_system ≥

where Q_{cut j} is the probability that _ is faulty.

- Back to our example, where the _ are _, _, _, _, and _

- R_system =

- We’d rather use the _ bound because we’d like to be _ about the reliability rather than _, and it’s _ to the exact value.
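
The two bounds can be compared numerically against the exact value. This sketch assumes the example diagram has minimal paths A-D-F, B-E-F, A-C-E-F and minimal cut sets {F}, {A,B}, {A,E}, {D,E}, {B,C,D}; both lists are assumptions derived for illustration, since the figure itself is not reproduced here.

```python
import math
from itertools import product

MODULES = "ABCDEF"
PATHS = [set("ADF"), set("BEF"), set("ACEF")]                    # assumed minimal paths
CUTS = [set("F"), set("AB"), set("AE"), set("DE"), set("BCD")]   # assumed minimal cut sets

def upper_bound(r):
    """1 minus the product over paths of (1 - path reliability)."""
    return 1.0 - math.prod(1.0 - math.prod(r[m] for m in path) for path in PATHS)

def lower_bound(r):
    """Product over cut sets of (1 - probability that the whole cut set is faulty)."""
    return math.prod(1.0 - math.prod(1.0 - r[m] for m in cut) for cut in CUTS)

def exact(r):
    """Exact reliability by enumerating all 2^6 module states."""
    total = 0.0
    for state in product([True, False], repeat=len(MODULES)):
        working = {m for m, ok in zip(MODULES, state) if ok}
        if any(path <= working for path in PATHS):
            total += math.prod(r[m] if ok else 1.0 - r[m] for m, ok in zip(MODULES, state))
    return total

if __name__ == "__main__":
    r = {m: 0.9 for m in MODULES}
    print(lower_bound(r), exact(r), upper_bound(r))
```

With every module at 0.9, the lower bound lands much nearer the exact value than the upper bound does.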



2.3.3 M-of-N Systems - 

Reliability 

• An_system is a system that consists of_ 

modules and needs at least_of them for proper 

operation, the system fails when_ 


modules are_. 

• The best-known_is the_, or_, 

system, in which there are_modules 

and a_. 

• Reliability of an_system 

- R_{M_of_N}(t) =

• The assumption that failures are _ is _ to the high reliability of _ systems.

- R_{M_of_N_cor}(t) =

where q_cor is the probability that the entire system suffers a common failure.
2.3.3 M-of-N Systems -

Triple Modular Redundant (TMR) Cluster 


• If a _ voter is used, that voter becomes a _ point of failure and the reliability of the _ is

- R_TMR =

• The general case of TMR is called _ redundancy (_) and is an M-of-N cluster with N odd and M = ⌈N/2⌉.
2.3.3 M-of-N Systems -

Comparing Reliabilities 



[Figure: comparison of system reliabilities plotted against R]

• For _ values of R(t), the _ the redundancy, the _ the system reliability. As R(t) _, the advantages of redundancy become _; until for R(t) < _, redundancy actually becomes a _, with the _ being the most reliable.
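
The comparison can be reproduced with a short calculation. This sketch assumes independent, identical modules with reliability r, so that the number of working modules is binomially distributed; the function names and the optional voter reliability parameter are hypothetical.

```python
from math import comb

def m_of_n_reliability(m, n, r):
    """Probability that at least m of n independent modules (each with reliability r) work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(m, n + 1))

def tmr_reliability(r, r_voter=1.0):
    """2-of-3 cluster in series with a single voter of reliability r_voter."""
    return r_voter * m_of_n_reliability(2, 3, r)

if __name__ == "__main__":
    for r in (0.95, 0.9, 0.7, 0.5, 0.4):
        simplex = r
        tmr = tmr_reliability(r)
        five_mr = m_of_n_reliability(3, 5, r)
        print(f"R={r:.2f}  simplex={simplex:.4f}  TMR={tmr:.4f}  5MR={five_mr:.4f}")
```

Under these assumptions the curves cross at r = 0.5: above it more redundancy helps, below it a single module is the most reliable option.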
2.3.4 Voters


• A voter receives inputs x_1, x_2, ..., x_N from an _ _ and generates a representative _.

• The simplest voter is one that does a_ 

comparison of the outputs and checks whether a 
_of the_inputs are_. 

• This approach is valid when there is_ 

_between all modules. 

• This_occurs when the modules are 

identical_, use identical_and 

identical_and have mutually_ 

clocks. 


• We declare two outputs x and y as _ _ if |x − y| < δ for some specified δ.

• There may also be_associated with each 

output. 
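
One possible way to vote on outputs that agree only approximately is sketched below. It is not the voter from the slides: the grouping rule (count the outputs within δ of a candidate) and the choice of the group median as the representative value are assumptions made for illustration.

```python
def approximate_majority_vote(outputs, delta):
    """Return a representative value if a majority of outputs lie within delta
    of some candidate output; return None if no such majority exists."""
    needed = len(outputs) // 2 + 1
    for candidate in outputs:
        group = sorted(x for x in outputs if abs(x - candidate) < delta)
        if len(group) >= needed:
            return group[len(group) // 2]   # middle element of the agreeing group
    return None

if __name__ == "__main__":
    print(approximate_majority_vote([10.00, 10.05, 42.0], delta=0.1))  # two outputs agree
    print(approximate_majority_vote([1.0, 5.0, 9.0], delta=0.1))       # no agreement
```

Note that approximate agreement is not transitive: x may agree with y and y with z while x and z differ by more than δ, so any grouping rule of this kind is a design choice.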
2.3.5 Variations on NMR -

Unit Level Redundancy 



• The voters are no longer as critical as in 


• A single faulty voter will cause _ than a single faulty unit, and the effect of either one will not propagate beyond the _.
2.3.5 Variations on NMR -

Triplicated Processor/Memory System 



• Communication is _.

• All communications go through_voting. 
2.3.5 Variations on NMR -

Dynamic Redundancy 


[Figure: an active module and spares Spare-1 through Spare-N, managed by a fault detection and reconfiguration unit]

• Powered Spares

- R_dynamic(t) =

• Spares not Powered

- R_dynamic(t) =

- c =

- R_dru =
2.3.5 Variations on NMR -

Hybrid Redundancy 


Hybrid redundancy boosts _ by adding _ _ that will be used to replace active modules once they become _.


* The outputs of the 
active primary modules 
are compared to identify 
a faulty primary, which 
is disconnected and 
replaced by a spare. 

* R_hybrid(t) =

* m =

* R_voter(t)

* R_rec(t)
2.3.5 Variations on NMR -

Hybrid Redundancy 


• Assumption was that any fault in the _, _, or _ will cause system failure.

• In practice, not all these faults are _.

• You’d have to know something about the various _.
2.3.5 Variations on NMR -

Sift-Out Modular Redundancy 


As in_, all N modules in the Sift-Out Modular 

Redundancy scheme are_, and the system is 

operational as long as there are at least_ 

modules. 


• Instead of a majority voter, this system uses _ and _ circuits.

• Faulty outputs, as identified by the _ and _, are not used in the collector, which _ fault-free modules.

• Exclude _ by requiring disagreement for _.
2.3.6 Duplex Systems -

Basics 


A duplex system, consisting of two processors and a comparator, is the simplest example of module redundancy.

• Both processors execute _.

• If the results are _, there is a _, and _ takes over.

- MTTF_duplex =

- R_duplex =
2.3.6 Duplex Systems -

Faulty Processor Identification 


• Acceptance Tests

- Example, _, is the output in an expected _

- What should the _ be?

• If it’s very _, all bad will be identified as bad, but some good may also be identified as bad.

• If it’s very _, all good will be identified as good, but some bad may also be identified as good.

• The _ is the conditional probability that the test _ given that the output is actually _.

• The _ is the conditional probability that the output _ given that the test _.

• We want them both to be very _.
2.3.6 Duplex Systems -

Faulty Processor Identification 

• Hardware Testing 

- Subject both processors to some hardware/logic 
test routines. 

- This approach works well as long as the fault is 
_, though it can still have escapes. 

• Forward Recovery 

- Use a third processor to repeat the 
computations. If only one of the three is faulty, 

then whichever processor the_ 

_with is the faulty one. 
2.3.6 Duplex Systems -

More Complicated Resilient Structures 


• Pair-and-Spare System 

- To avoid disruption of service, an _ is disconnected and the _ is transferred to a _.

- The two members of the switched-out pair can now be tested offline to determine whether the fault was _ or _.

- In the case of a _ marked as a _
2.3.6 Duplex Systems -

More Complicated Resilient Structures 


• Triplex-Duplex System 

- Processors are tied together to form_, and 

then, a_is formed out of these_. 

- When the processors in a_disagree, both of 

them are_of the system 

- This arrangement allows for the_of voting 

combined with a simpler identification of_ 


- Furthermore, the_can continue to function 

even if only_is left functional, because 

the duplex arrangement allows the_ 
2.4.1 Poisson Processes


• Consider_events of some sort, 

occurring over time with the following_ 

behavior: For a time interval of very short length, Δt,

- P_1(Δt) =

- P_{>1}(Δt) =

- P_0(Δt) =

• Let N(t) denote the _ occurring in an interval of length t, and let P_k(t) = Prob{N(t) = k} be the probability of exactly _ occurring during an interval of length t (k = 0, 1, 2, ...).

- P_k(t + Δt) =

- P_0(t + Δt) =
2.4.1 Poisson Processes


• These approximations become more accurate as Δt → 0, and lead to the differential equations


• Using the initial condition P 0 (0) = 1, the solution to 
this set of differential equations is 
2.4.1 Poisson Processes


• N(t) is a Poisson process with rate λ

- The expected _ occurring in an interval of length t is λt.

- The length of time between _ events is an exponentially distributed random variable with parameter λ and mean value 1/λ.

- The number of events occurring in disjoint intervals of time are _ of one another.

- The sum of two independent Poisson processes with rates λ_1 and λ_2 is itself a Poisson process with rate λ_1 + λ_2.
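
The link between the differential equations and the Poisson distribution can be checked numerically. This sketch assumes the standard equations dP_0/dt = -λP_0 and dP_k/dt = -λP_k + λP_{k-1} with P_0(0) = 1, integrates them with a simple Euler scheme, and compares the result with the closed form e^{-λt}(λt)^k / k!. The function names and step count are hypothetical.

```python
import math

def poisson_pmf(k, lam, t):
    """Closed-form probability of exactly k events in an interval of length t."""
    return math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)

def integrate_poisson_odes(lam, t, k_max, steps=20_000):
    """Euler integration of the P_k(t) equations for k = 0..k_max, starting from P_0(0) = 1."""
    dt = t / steps
    p = [1.0] + [0.0] * k_max
    for _ in range(steps):
        new_p = [p[0] - lam * p[0] * dt]
        for k in range(1, k_max + 1):
            new_p.append(p[k] + (lam * p[k - 1] - lam * p[k]) * dt)
        p = new_p
    return p

if __name__ == "__main__":
    lam, t = 2.0, 1.5
    numeric = integrate_poisson_odes(lam, t, k_max=5)
    for k, value in enumerate(numeric):
        print(k, round(value, 5), round(poisson_pmf(k, lam, t), 5))
```

The Euler scheme is first-order accurate, so the two columns agree to a few decimal places and converge as the step count grows.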
2.4.1 Poisson Processes -

Duplex System 

• System consists of two_active processors 

with an unlimited number of_spares. 

- The two active processors are subject to failures occurring at a constant rate of λ per processor.

- As before, the coverage factor c is the probability of successful detection and _; assume the comparator failure rate is negligible and _ is instantaneous.

- N(t), the number of failures that occur in _ _, is a Poisson process with the rate λ.

- M(t), the number of failures that occur in _, is a Poisson process with the rate 2λ.

Prob{k failures in duplex} = Prob{M(t) = k} =
2.4.1 Poisson Processes -

Duplex System Reliability Calculation 


• For the duplex system not to fail, each of these failures must be _ and the processor _ _. The probability of one such success is c, and the probability that the system will survive k failures is c^k.

- R_duplex(t) =


• The extension to the case with only a_set of 

spares requires capping the summation at the_ 
2.4.1 Poisson Processes -

Duplex System Reliability Reasoning 

• Individual processors fail at a rate λ, and so processor failures occur in the duplex at the rate 2λ.

• Each processor failure has a probability c of being successfully dealt with and a probability of (1 − c) of causing failure to the duplex.

• As a result, failures that crash the duplex occur with rate 2λ(1 − c).

• The reliability of the system is thus $e^{-2\lambda(1-c)t}$.
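
The two lines of reasoning can be checked against each other numerically. This sketch sums the probability of exactly k failures (a Poisson process with rate 2λ) times the probability c^k of surviving all of them, and compares the sum with exp(-2λ(1-c)t). The function names and the `max_failures` cap, which stands in for a finite number of spares, are assumptions made for this example.

```python
import math

def duplex_reliability_sum(lam, c, t, max_failures=200):
    """Sum over k = 0..max_failures of Prob{k failures} * c**k."""
    rate = 2.0 * lam * t
    total = 0.0
    term = math.exp(-rate)             # k = 0 term
    for k in range(max_failures + 1):
        total += term
        term *= rate * c / (k + 1)     # move from the k term to the k + 1 term
    return total

def duplex_reliability_closed(lam, c, t):
    """Closed form obtained from the rate 2*lam*(1 - c) of uncovered failures."""
    return math.exp(-2.0 * lam * (1.0 - c) * t)

if __name__ == "__main__":
    lam, c, t = 0.001, 0.95, 1000.0
    print(duplex_reliability_sum(lam, c, t), duplex_reliability_closed(lam, c, t))
    print(duplex_reliability_sum(lam, c, t, max_failures=2))   # only two spares available
```

Capping the summation gives a lower reliability, as expected when the supply of spares is limited.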
2.4.2 Markov Models


• Markov models provide a_for the 

derivation of reliabilities of systems. 

• A Markov chain is a special type of _ _, X(t): an infinite number of random variables, indexed by t, with a special _ structure.

• For X(t) to be a Markov chain, its future state must depend only on the _ and not on any _ _.

• If X(t) = i, the chain is in state i at time t.

• We deal only with Markov chains with _ time (0 < t < ∞) and _ state (X(t) = 0, 1, 2, ...).
2.4.2 Markov Models -

Probabilistic Behavior 

• Once a Markov chain moves into some state i, it stays there for a length of time that has an _ distribution with parameter λ_i, implying a constant rate, λ_i, of leaving state i.

• P_ij is the probability that, when _ state i, the chain will move to state j (with j ≠ i).

• The _ rate from state i to state j, λ_ij, is thus λ_ij = P_ij λ_i.

• P_i(t) is the probability that the process will be in state i at time t. The chain can be in state i at time t + Δt in two ways:

- It was in state i and _ during Δt

- It was at some other state j and _ during Δt
2.4.2 Markov Models -

Probabilistic Behavior 


• We have

- P_i(t + Δt) ≈ P_{i,0}(Δt) + P_{i,1}(Δt), where P_{i,1}(Δt) is the contribution from all other states

- P_{i,0}(Δt) =          P_{i,1}(Δt) =

- P_i(t + Δt) ≈

- dP_i(t)/dt =

• Initial conditions: P_{i_0}(0) = 1 and P_j(0) = 0 for j ≠ i_0
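
The state equations can be integrated numerically for any small chain. The sketch below assumes the standard form dP_i/dt = -λ_i P_i + Σ_{j≠i} λ_ji P_j and uses a hypothetical three-state duplex model (both processors up, one up, failed) that is not taken from the slides; all names and rates are illustrative.

```python
import math

def solve_markov(rates, p0, t, steps=20_000):
    """Euler integration of the state probabilities.
    rates[i][j] is the transition rate from state i to state j (i != j)."""
    n = len(p0)
    dt = t / steps
    p = list(p0)
    for _ in range(steps):
        new_p = []
        for i in range(n):
            leave = sum(rates[i][j] for j in range(n) if j != i)          # total rate out of i
            enter = sum(rates[j][i] * p[j] for j in range(n) if j != i)   # flow into i
            new_p.append(p[i] + (enter - leave * p[i]) * dt)
        p = new_p
    return p

if __name__ == "__main__":
    lam, c = 0.001, 0.95
    # Hypothetical model: state 0 = both up, state 1 = one up, state 2 = system failed.
    rates = [
        [0.0, 2 * lam * c, 2 * lam * (1 - c)],
        [0.0, 0.0, lam],
        [0.0, 0.0, 0.0],
    ]
    p = solve_markov(rates, [1.0, 0.0, 0.0], t=1000.0)
    print(p, "reliability =", p[0] + p[1])
```

In this model the chain leaves state 0 at total rate 2λ, so P_0(t) should track exp(-2λt), which gives a quick check of the integration.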
2.5 Fault-Tolerance
Processor-Level Techniques 


• _structures can be applied to a wide 

range of modules, from_to 

_, to_, etc. 

• In many cases, the overhead is_. 

• Another approach is to execute every program

_, using results only_. No 

hardware redundancy but severe time 
redundancy -_. 

• We could apply this at the_level. 

• Alternate scheme is_processor that 

monitors the behavior of the 
2.5.1 Watchdog Processor


• The watchdog processor monitors the 

_, looking mainly for proper 

program control. 

• The_must know what to expect. 

• This information is derived from the control flow graph (CFG); each node is a _
2.5.1 Watchdog Processor -

Assigned Signatures 


• Signatures correspond to _ of the CFG; they can be _ or _.

• CFG and corresponding watchdog program with 
_signatures 

• Errors are not detected. 
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ace ept sig(Vl); 
either 

accept sig(V2); 
either 

accept sig(V3); 
or 

accept sig(V4); 
accept sig(V5); 
or 

accept sig(V5); Page46 
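
A watchdog check of this kind can be sketched as a walk over the CFG. The successor table below is a hypothetical reading of the pseudocode above (V1 is followed by V2 or V5, V2 by V3 or V4, and V3 and V4 by V5), and the function name is an assumption; the original figure is not available to confirm the graph.

```python
# Hypothetical CFG: each node's assigned signature is its own name.
SUCCESSORS = {
    "V1": {"V2", "V5"},
    "V2": {"V3", "V4"},
    "V3": {"V5"},
    "V4": {"V5"},
    "V5": set(),
}

def check_signature_stream(signatures, start="V1"):
    """Return True if the observed signatures follow a legal path through the CFG."""
    if not signatures or signatures[0] != start:
        return False
    for previous, current in zip(signatures, signatures[1:]):
        if current not in SUCCESSORS.get(previous, set()):
            return False            # illegal control-flow transition detected
    return True

if __name__ == "__main__":
    print(check_signature_stream(["V1", "V2", "V4", "V5"]))  # legal path
    print(check_signature_stream(["V1", "V3", "V5"]))        # V1 -> V3 is not an edge
```

A check like this only validates the order of blocks; an error that corrupts data without changing the control flow passes unnoticed.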
2.5.1 Watchdog Processor -

Calculated Signatures 

• Signatures can be calculated, for example, by_ 


• Watchdog holds 


calculated signatures 


• Still won’t detect data errors, could use_ 

supplement with other_schemes. 

accept & check sig(V1);
either
    accept & check sig(V2);
    either
        accept & check sig(V3);
    or
        accept & check sig(V4);
    accept & check sig(V5);
or
    accept & check sig(V5);
2.5.2 Simultaneous Multithreading for

Fault Tolerance 


• If data and control dependencies limit the amount of 

_that can be extracted out of individual 

threads, allow the processor to execute_ 

_simultaneously. 

• _for simultaneous execution is 

required. 

• Each thread must have 


• For fault detection purposes, two_ 

threads are created for each original thread. 

• These threads execute the same code and receive 
the same inputs. 

• If they produce the same results,_, else 
2.5.2 Simultaneous Multithreading for

Fault Tolerance 


• To reduce the_of re-execution, 

one thread trails the other and takes advantage of 

the_, for example,_ 

_results. 

• For the two threads to be_, they must 

execute on different sets of_ 


• Items that are _ for the two threads are said to be within the _; otherwise they are outside it.

[Figure: Sphere of Replication]

2.6 Byzantine Failures 


• Byzantine failures are_failures, failures that 

are not obvious faults but that produce 


• If_has such a failure in a TMR, the 

other two will just_it. 

• However, when processors are_with no 

_entity, problems can ensue. 


• Consider a sensor providing information to two processors: it tells processor 1 that the value is 25° and processor 2 that it is 45°. Each processor knows there is a problem but not which value is right.
2.6 Byzantine Failures - Byzantine Generals
(Interactive Consistency) Problem 

• One sender _ an order to multiple receivers who can _ about the value they received from the _.

• A functional unit will be_in all its messages. 

• A faulty unit may behave_. 

• All communications have a_mechanism. 

• Interactive Consistency Conditions 

• IC1. All _ units must arrive at an _ of the value that was transmitted by the _.

• IC2. If the original source was _, the value they agree upon must be the value that was transmitted by the original source.
2.6 Byzantine Failures -

Interactive Consistency Algorithm 

• Algorithm Byz(N, m). 

• N is the_(_and N-1_) 

• m is the number of_units 

• Interactive consistency is possible if_ 

• Pseudocode

Source disseminates the information to the N-1 receivers.
If m > 0 then
    Each receiver runs Byz(N-1, m-1)
    Each unit takes a vote over all messages received
    If majority
        Use majority
    Else
        Use default
Else
    Each receiver uses the value received from the source
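
The recursion in the pseudocode can be sketched as a small simulation. This is an illustrative reading of Byz(N, m), not a definitive implementation: the `lie` callback that models faulty behavior, the default value of 0, and all function names are assumptions introduced here.

```python
from collections import Counter

DEFAULT = 0  # assumed default value, used when no strict majority exists

def majority(values, default=DEFAULT):
    """Return the value held by a strict majority, or the default if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default

def byz(source, receivers, value, m, faulty, lie):
    """Return {receiver: value that receiver settles on for this source}.
    A faulty sender's messages are produced by lie(sender, receiver, value)."""
    sent = {r: lie(source, r, value) if source in faulty else value for r in receivers}
    if m == 0:
        return sent                      # each receiver uses the value it received
    # Each receiver relays what it received to the others using Byz(N-1, m-1).
    relayed = {
        r: byz(r, [q for q in receivers if q != r], sent[r], m - 1, faulty, lie)
        for r in receivers
    }
    # Each receiver votes over its own copy and the copies relayed by the others.
    return {
        r: majority([sent[r]] + [relayed[q][r] for q in receivers if q != r])
        for r in receivers
    }

if __name__ == "__main__":
    # Faulty source tells R1 and R2 the value 1 and R3 the value 0.
    two_faced = lambda sender, receiver, value: 0 if receiver == "R3" else 1
    print(byz("S", ["R1", "R2", "R3"], 1, 1, {"S"}, two_faced))
```

With four units and one faulty source, all three receivers settle on the same value, which is what condition IC1 requires.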
2.6 Byzantine Failures -

IC Algorithm Notation & Degenerate Example

• If A and B are units, then _ means that A sent B the message n.

• If U is a string of units A_1, A_2, ..., A_m, and B is a unit, then _ means that B received the message n from A_m, who claims to have received it from A_{m-1}, and so on.

• A message that is not sent is denoted by φ. For example, A.B(φ) means that the message that A was supposed to send B was never sent.

• Example, degenerate case, m = 0. The source sends to all receivers, who use the value sent.
2.6 Byzantine Failures -

IC Algorithm Example (m = 1)

• Example, m = 1, need at least 4 units, S, R_1, R_2, R_3

• S is faulty, default = 1, IC(R_2, R_1) is value of R_1 as reported by R_2

• Byz(4,1)

S.R_1(1), S.R_2(1), S.R_3(0)

Since m = 1,

R_1 runs Byz(3,0): S.R_1.R_2(1), S.R_1.R_3(1),
    IC(R_2, R_1) = 1, IC(R_3, R_1) = 1
R_2 runs Byz(3,0): S.R_2.R_1(1), S.R_2.R_3(1),
    IC(R_1, R_2) = 1, IC(R_3, R_2) = 1
R_3 runs Byz(3,0): S.R_3.R_1(0), S.R_3.R_2(0),
    IC(R_1, R_3) = 0, IC(R_2, R_3) = 0

ICV(R_1) = (1,1,0), ICV(R_2) = (1,1,0), ICV(R_3) = (1,1,0)

R_1, R_2, and R_3 vote and get 1
ICV(R_1) is (source, R_2 reported by R_1, R_3 reported by R_1)
2.6 Byzantine Failures -

IC Algorithm Example (m = 2)


• Let N = 7, m = 2, with units S, R_1, R_2, R_3, R_4, R_5, R_6; R_1 and R_6 are faulty

• Byz(7,2)

• S.R_1(1), S.R_2(1), S.R_3(1), S.R_4(1), S.R_5(1), S.R_6(1)

• R_1 runs Byz(6,1)

• S.R_1.R_2(1), S.R_1.R_3(2), S.R_1.R_4(3), S.R_1.R_5(4), S.R_1.R_6(0)

• R_2 runs Byz(5,0)

• S.R_1.R_2.R_3(1), S.R_1.R_2.R_4(1), S.R_1.R_2.R_5(1), S.R_1.R_2.R_6(1)

• R_3 runs Byz(5,0)

• S.R_1.R_3.R_2(2), S.R_1.R_3.R_4(2), S.R_1.R_3.R_5(2), S.R_1.R_3.R_6(2)

• R_4 runs Byz(5,0)

• S.R_1.R_4.R_2(3), S.R_1.R_4.R_3(3), S.R_1.R_4.R_5(3), S.R_1.R_4.R_6(3)
2.6 Byzantine Failures -

Another Algorithm Example 

• R_5 runs Byz(5,0)

• S.R_1.R_5.R_2(4), S.R_1.R_5.R_3(4), S.R_1.R_5.R_4(4), S.R_1.R_5.R_6(4)

• R_6 runs Byz(5,0)

• S.R_1.R_6.R_2(1), S.R_1.R_6.R_3(8), S.R_1.R_6.R_4(0), S.R_1.R_6.R_5(φ)

• ICV_{S.R_1}(R_2) = (1, 2, 3, 4, 1)    S.R_1 reported by R_2 = 0

• ICV_{S.R_1}(R_3) = (1, 2, 3, 4, 8)    S.R_1 reported by R_3 = 0

• ICV_{S.R_1}(R_4) = (1, 2, 3, 4, 0)    S.R_1 reported by R_4 = 0

• ICV_{S.R_1}(R_5) = (1, 2, 3, 4, 0)    S.R_1 reported by R_5 = 0

• ICV_{S.R_1}(R_6) = (·, ·, ·, ·, ·)    S.R_1 reported by R_6 = 0

• ... 
2.6.1 Byzantine Agreement with
Message Authentication 

• Algorithm AByz(N, m). 

• Source _ with ψ and sends it out to each of the processors.

• Each processor i that receives a _ ψ:A, where A is the set of _ appended to the message, checks the _ of signatures in A. If this number is less than _, it sends out ψ:A ∪ {i} (what it received plus its own signature) to each of the processors _. It also adds this message, ψ, to its list of _ messages.

• When a processor has seen the signatures of _ _ processor (or has timed out), it applies some _ to select from among the messages it has received.